A Corpus of Word-Aligned Asked and Anticipated Questions in a Virtual Patient Dialogue System

Authors

  • Ajda Gokcen
  • Evan Jaffe
  • Johnsey Erdmann
  • Michael White
  • Douglas Danforth
Abstract

We present a corpus of virtual patient dialogues to which we have added manually annotated gold standard word alignments. Since each question asked by a medical student in the dialogues is mapped to a canonical, anticipated version of the question, the corpus implicitly defines a large set of paraphrase (and non-paraphrase) pairs. We also present a novel process for selecting the most useful data to annotate with word alignments and for ensuring consistent paraphrase status decisions. In support of this process, we have enhanced the earlier Edinburgh alignment tool (Cohn et al., 2008) and revised and extended the Edinburgh guidelines, in particular adding guidance intended to ensure that the word alignments are consistent with the overall paraphrase status decision. The finished corpus and the enhanced alignment tool are made freely available.
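As a rough illustration of how a mapping from asked questions to canonical questions can implicitly define paraphrase and non-paraphrase pairs, here is a minimal Python sketch. The example questions, the `derive_pairs` helper, and the rule that an asked question paired with any other canonical question counts as a non-paraphrase are all assumptions made for illustration; they are not taken from the corpus or from the authors' tooling.

```python
# Hypothetical toy data: (asked question, canonical question it was mapped to).
# These strings are invented for illustration and do not come from the corpus.
annotated = [
    ("what brings you in today", "What brings you in today?"),
    ("why are you here", "What brings you in today?"),
    ("does anything make the pain better", "Does anything make the pain better?"),
]


def derive_pairs(annotated):
    """Pair every asked question with every canonical question.

    Assumption: a pair is labeled a paraphrase (True) only when the
    canonical question is the one the asked question was mapped to;
    every other combination is labeled a non-paraphrase (False).
    """
    canonicals = sorted({canon for _, canon in annotated})
    pairs = []
    for asked, gold in annotated:
        for canon in canonicals:
            pairs.append((asked, canon, canon == gold))
    return pairs


pairs = derive_pairs(annotated)
positives = [p for p in pairs if p[2]]      # paraphrase pairs
negatives = [p for p in pairs if not p[2]]  # non-paraphrase pairs
```

Under this assumption the number of non-paraphrase pairs grows with the number of canonical questions, which is one way a modest set of annotated dialogues could yield a large set of pairs.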





Publication date: 2016